SGMLS(1) SGMLS(1)
NAME
sgmls - a validating SGML parser
An SGML System Conforming to
International Standard ISO 8879 --
Standard Generalized Markup Language
SYNOPSIS
sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ filename... ]
DESCRIPTION
Sgmls parses and validates the SGML document entity in
filename... and prints on the standard output a simple ASCII
representation of its Element Structure Information Set. (This
is the information set which a structure-controlled conforming
SGML application should act upon.) Note that the document
entity may be spread amongst several files; for example, the
SGML declaration, document type declaration and document
instance set could each be in a separate file. If no filenames
are specified, then sgmls will read the document entity from
the standard input. A filename of - can also be used to refer
to the standard input.
The following options are available:
-cfile  Write a report of capacity usage to file. The report is
        in the format of a RACT result. RACT is the Reference
        Application for Capacity Testing defined in the Proposed
        American National Standard Conformance Testing for
        Standard Generalized Markup Language (SGML) Systems
        (X3.190-199X), Draft July 1991.
-d      Warn about duplicate entity declarations.
-e      Describe open entities in error messages. Error messages
        always include the position of the most recently opened
        external entity.
-g      Show the GIs of open elements in error messages.
-iname  Pretend that
            <!ENTITY % name "INCLUDE">
        occurs at the start of the document type declaration
        subset in the SGML document entity. Since repeated
        definitions of an entity are ignored, this definition
        will take precedence over any other definitions of this
        entity in the document type declaration. Multiple -i
        options are allowed. If the SGML declaration replaces
        the reserved name INCLUDE, then the new reserved name
        will be the replacement text of the entity. Typically
        the document type declaration will contain
            <!ENTITY % name "IGNORE">
        and will use %name; in the status keyword specification
        of a marked section declaration. In this case the effect
        of the option will be to cause the marked section not to
        be ignored.
-l      Output L commands giving the current line number and
        filename.
-p      Parse only the prolog. Sgmls will exit after parsing the
        document type declaration. Implies -s.
-r      Warn about defaulted references.
-s      Suppress output. Error messages will still be printed.
-u      Warn about undefined elements: elements used in the DTD
        but not defined. Also warn about undefined short
        reference maps.
-v      Print the version number.
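The -i option works because the first definition of an entity wins. The following is a minimal Python sketch of that rule only, modeling parameter entity declarations as (name, text) pairs; `declare` and `effective_entities` are hypothetical names, this is not how sgmls itself is implemented, and it ignores the case where the SGML declaration renames INCLUDE.

```python
from typing import Dict, List, Tuple


def declare(entities: Dict[str, str], name: str, text: str) -> None:
    """Record a parameter entity; a repeated definition is ignored (first one wins)."""
    entities.setdefault(name, text)


def effective_entities(dtd_decls: List[Tuple[str, str]],
                       include_opts: List[str]) -> Dict[str, str]:
    """Model -iname: each option acts like <!ENTITY % name "INCLUDE"> placed
    at the start of the document type declaration subset."""
    entities: Dict[str, str] = {}
    for name in include_opts:      # the pretended declarations come first
        declare(entities, name, "INCLUDE")
    for name, text in dtd_decls:   # then the DTD's own declarations, in order
        declare(entities, name, text)
    return entities


dtd = [("draft", "IGNORE"), ("final", "INCLUDE")]
print(effective_entities(dtd, []))         # {'draft': 'IGNORE', 'final': 'INCLUDE'}
print(effective_entities(dtd, ["draft"]))  # {'draft': 'INCLUDE', 'final': 'INCLUDE'}
```

With "-idraft", a marked section whose status keyword is %draft; would therefore be included rather than ignored.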
Entity Manager
An external entity resides in one or more files. The
entity manager component of sgmls maps a sequence of files
into an entity in three sequential stages:
1. each carriage return character is turned into a
non-SGML character;
2. each newline character is turned into a record end
character, and at the same time a record start
character is inserted at the beginning of each
line;
3. the files are concatenated.
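The three stages above can be sketched in Python as follows. This is an illustration, not sgmls's code: RS, RE and NONSGML are hypothetical placeholder tokens standing in for the record start, record end and non-SGML characters, and the sketch assumes that a final line without a trailing newline still receives a record start. Because concatenation is the last stage, each file begins a fresh line.

```python
from typing import List

# Hypothetical placeholder tokens; a real parser would use character codes.
RS = "<RS>"            # record start
RE = "<RE>"            # record end
NONSGML = "<NONSGML>"  # a non-SGML character


def file_to_tokens(text: str) -> List[str]:
    """Stages 1 and 2 for one file's contents."""
    tokens: List[str] = []
    at_line_start = True
    for ch in text:
        if at_line_start:
            tokens.append(RS)        # stage 2: RS at the beginning of each line
            at_line_start = False
        if ch == "\r":
            tokens.append(NONSGML)   # stage 1: carriage return -> non-SGML character
        elif ch == "\n":
            tokens.append(RE)        # stage 2: newline -> record end
            at_line_start = True
        else:
            tokens.append(ch)
    return tokens


def entity_from_files(texts: List[str]) -> List[str]:
    """Stage 3: concatenate the per-file results into one entity."""
    entity: List[str] = []
    for text in texts:
        entity.extend(file_to_tokens(text))
    return entity


print(entity_from_files(["a\r\nb\n", "c"]))
# ['<RS>', 'a', '<NONSGML>', '<RE>', '<RS>', 'b', '<RE>', '<RS>', 'c']
```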
A system identifier is interpreted as a list of filenames
separated by colons. A filename of - can be used to refer to the
standard input. If no system identifier is supplied, then the
entity manager will attempt to generate a filename using the
public identifier (if there is one) and other information
available to it. Notation identifiers are not subject to this
treatment. This process is controlled by the environment
variable SGML_PATH, which contains a colon-separated list of
filename templates. A filename template is a filename that may
contain substitution fields; a substitution field is a %
character followed by a single letter that indicates the value
of the substitution. If SGML_PATH uses the %S field (the value
of which is the system identifier), then the entity manager will
also use SGML_PATH to generate a filename when a system
identifier that does not contain any colons is supplied. The
value of a substitution can either be a string or it can be
null. The entity manager transforms the list of filename
templates into a list of filenames by substituting for each
substitution field and discarding any template that contained a
substitution field whose value was null. It then uses the first
resulting filename that exists and is readable. Substitution
values are transformed before being used for substitution:
firstly, any names that were subject to upper case substitution
are folded to lower case; secondly, space characters are mapped
to underscores and slashes are mapped to percents. The value of
the %S field is not transformed. The values of substitution
fields are as follows:
%%   A single %.
%D   The entity's data content notation. This substitution will
     succeed only for external data entities.
%N   The entity, notation or document type name.
%P   The public identifier if there was a public identifier,
     otherwise null.
%S   The system identifier if there was a system identifier,
     otherwise null.
%X   (This is provided mainly for compatibility with ARCSGML.)
     A three-letter string chosen as follows:
                                | No public  | With public identifier
                                | identifier +-------------+-----------
                                |            | Device      | Device
                                |            | independent | dependent
     ---------------------------+------------+-------------+-----------
     Data or subdocument entity | nsd        | pns         | vns
     General SGML text entity   | gml        | pge         | vge
     Parameter entity           | spe        | ppe         | vpe
     Document type definition   | dtd        | pdt         | vdt
     Link process definition    | lpd        | plp         | vlp
     The device dependent version is selected if the public text
     class allows a public text display version but no public
     text display version was specified.
%Y   The type of thing for which the filename is being generated:
     SGML subdocument entity     sgml
     Data entity                 data
     General text entity         text
     Parameter entity            parm
     Document type definition    dtd
     Link process definition     lpd
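The %X and %Y fields are table lookups. The Python sketch below encodes the two tables; the dictionary keys (such as "data_or_subdoc" and "parameter") are hypothetical labels, and whether a public text class allows a display version is left to the caller as a flag rather than guessed from the class name.

```python
from typing import Dict, Optional, Tuple

# %X rows: (no public identifier, device independent, device dependent)
X_TABLE: Dict[str, Tuple[str, str, str]] = {
    "data_or_subdoc": ("nsd", "pns", "vns"),
    "general_text": ("gml", "pge", "vge"),
    "parameter": ("spe", "ppe", "vpe"),
    "dtd": ("dtd", "pdt", "vdt"),
    "lpd": ("lpd", "plp", "vlp"),
}

# %Y values keyed by the kind of thing the filename is generated for
Y_TABLE: Dict[str, str] = {
    "subdoc": "sgml",
    "data": "data",
    "general_text": "text",
    "parameter": "parm",
    "dtd": "dtd",
    "lpd": "lpd",
}


def x_field(kind: str, public_id: Optional[str],
            class_allows_version: bool = False,
            version: Optional[str] = None) -> str:
    """Choose the %X string; the caller decides whether the class allows a version."""
    no_public, independent, dependent = X_TABLE[kind]
    if public_id is None:
        return no_public
    # Device dependent: a display version is allowed but none was specified.
    if class_allows_version and version is None:
        return dependent
    return independent


print(x_field("parameter", None), Y_TABLE["parameter"])  # spe parm
print(x_field("dtd", "-//Acme//DTD Report//EN", class_allows_version=True))  # vdt
```

An empty display version counts as specified, so it selects the device independent column.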
The value of the following substitution fields will be null
unless a valid formal public identifier was supplied.
%A   Null if the text identifier in the formal public identifier
     contains an unavailable text indicator, otherwise the empty
     string.
%C   The public text class, mapped to lower case.
%E   The public text designating sequence (escape sequence) if
     the public text class is CHARSET, otherwise null.
%I   The empty string if the owner identifier in the formal
     public identifier is an ISO owner identifier, otherwise
     null.
%L   The public text language, mapped to lower case, unless the
     public text class is CHARSET, in which case null.
%O   The owner identifier (with the +// or -// prefix stripped).
%R   The empty string if the owner identifier in the formal
     public identifier is a registered owner identifier,
     otherwise null.
%T   The public text description.
%U   The empty string if the owner identifier in the formal
     public identifier is an unregistered owner identifier,
     otherwise null.
%V   The public text display version. This substitution will be
     null if the public text class does not allow a display
     version or if no version was specified. If an empty version
     was specified, a value of default will be used.
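The template expansion described in this section can be sketched in Python. This is a simplified illustration under several assumptions: `fpi_fields`, `transform`, `expand` and `find_file` are hypothetical names; the formal public identifier parser expects well-formed input with no "//" inside the description; only %N is treated as subject to upper case substitution; unknown field letters are treated as null; and the rule about %S and colon-free system identifiers is not modeled. Values for %D, %N, %P, %S, %X and %Y are supplied by the caller.

```python
import os
from typing import Dict, Optional

Values = Dict[str, Optional[str]]  # substitution letter -> value; None means null


def fpi_fields(fpi: str, class_allows_version: bool = True) -> Values:
    """Derive %A %C %E %I %L %O %R %T %U %V from a formal public identifier."""
    if fpi.startswith("+//"):
        owner_kind, rest = "registered", fpi[3:]
    elif fpi.startswith("-//"):
        owner_kind, rest = "unregistered", fpi[3:]
    else:
        owner_kind, rest = "iso", fpi
    owner, text_id = rest.split("//", 1)
    text_class, tail = text_id.split(" ", 1)
    unavailable = tail.startswith("-//")   # unavailable text indicator
    if unavailable:
        tail = tail[3:]
    parts = tail.split("//")
    description, lang_or_seq = parts[0], parts[1]
    version: Optional[str] = parts[2] if len(parts) > 2 else None
    if not class_allows_version:
        version = None
    elif version == "":
        version = "default"                 # empty version -> "default"
    charset = text_class == "CHARSET"
    return {
        "A": None if unavailable else "",
        "C": text_class.lower(),
        "E": lang_or_seq if charset else None,
        "I": "" if owner_kind == "iso" else None,
        "L": None if charset else lang_or_seq.lower(),
        "O": owner,
        "R": "" if owner_kind == "registered" else None,
        "T": description,
        "U": "" if owner_kind == "unregistered" else None,
        "V": version,
    }


def transform(value: str, fold_case: bool) -> str:
    """Fold upper-case-substituted names, then map ' ' -> '_' and '/' -> '%'."""
    if fold_case:
        value = value.lower()
    return value.replace(" ", "_").replace("/", "%")


def expand(template: str, values: Values, folded: str = "N") -> Optional[str]:
    """Expand one template; return None if any field used in it is null."""
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            letter = template[i + 1]
            i += 2
            if letter == "%":
                out.append("%")
                continue
            value = values.get(letter)   # assumption: unknown letters count as null
            if value is None:
                return None              # discard this template
            out.append(value if letter == "S" else transform(value, letter in folded))
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def find_file(sgml_path: str, values: Values) -> Optional[str]:
    """Return the first expanded filename that exists and is readable."""
    for template in sgml_path.split(":"):
        name = expand(template, values)
        if name is not None and os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None


vals = fpi_fields("-//Acme//DTD Report//EN")
vals.update({"N": "REPORT", "Y": "dtd", "S": None})
print(expand("/usr/lib/sgml/%O/%N.%Y", vals))  # /usr/lib/sgml/Acme/report.dtd
print(expand("%S", vals))                      # None
```

In practice the first argument of `find_file` would be the value of SGML_PATH, for example `os.environ.get("SGML_PATH", "")`.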
System declaration
The system declaration for sgmls is as follows:
SYSTEM "ISO 8879:1986"
     CHARSET
          BASESET "ISO 646-1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
          DESCSET 0 128 0
     CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
     FEATURES
          MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
          LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
          OTHER CONCUR NO SUBDOC YES 1 FORMAL YES
     SCOPE DOCUMENT
     SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
     SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Core//EN"
     VALIDATE
          GENERAL YES MODEL YES EXCLUDE YES CAPACITY YES
          NONSGML YES SGML YES FORMAL YES
     SDIF
          PACK NO UNPACK NO
The memory usage of sgmls is not a function of the capacity
points used by a document; however, sgmls can handle capacities
significantly greater than the reference capacity set.
In some environments, higher values may be supported for the
SUBDOC parameter.
Documents that do not use optional features are also supported.
For example, if FORMAL NO is specified in the SGML declaration,
public identifiers will not be required to be valid formal
public identifiers.
Certain parts of the concrete syntax may be changed:

     The shunned character numbers can be changed.

     Eight-bit characters can be assigned to LCNMSTRT, UCNMSTRT,
     LCNMCHAR and UCNMCHAR. Declaring this requires that the
     syntax reference character set be declared like this:
          BASESET "ISO Registration Number 100//CHARSET
                   ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
          DESCSET 0 256 0

     Uppercase substitution can be performed or not performed
     both for entity names and for other names.

     Either short reference delimiters assigned by the reference
     delimiter set or no short reference delimiters are
     supported.

     The reserved names can be changed.

     The quantity set can be increased within certain limits,
     subject to there being sufficient memory available. The
     upper limit on NAMELEN is 239. The upper limits on ATTCNT,
     ATTSPLEN, BSEQLEN, ENTLVL, LITLEN, PILEN, TAGLEN, and TAGLVL
     are more than thirty times greater than the reference
     limits. The upper limit on GRPCNT, GRPGTCNT, and GRPLVL
     is 253. NORMSEP cannot be changed. DTAGLEN and DTEMPLEN
     are irrelevant since sgmls does not support the DATATAG
     feature.
SGML declaration
The SGML declaration may be omitted, in which case the following
declaration will be implied:
<!SGML "ISO 8879:1986"
     CHARSET
          BASESET "ISO 646-1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
          DESCSET   0   9 UNUSED
                    9   2 9
                   11   2 UNUSED
                   13   1 13
                   14  18 UNUSED
                   32  95 32
                  127   1 UNUSED
     CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
     SCOPE DOCUMENT
     SYNTAX PUBLIC "ISO 8879:1986//SYNTAX Reference//EN"
     FEATURES
          MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
          LINK SIMPLE NO IMPLICIT NO EXPLICIT NO
          OTHER CONCUR NO SUBDOC YES 99999999 FORMAL YES
     APPINFO NONE>
with the exception that characters 128 through 254 will be
assigned to DATACHAR. When exporting documents that use
characters in this range, an accurate description of the
upper half of the document character set should be added
to this declaration. For ISO Latin-1, an appropriate
description would be:
     BASESET "ISO Registration Number 100//CHARSET
              ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
     DESCSET 128  32 UNUSED
             160  95 32
             255   1 UNUSED
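Each DESCSET entry is a triple: a starting document character number, a count, and either the starting character number in the base set named by the preceding BASESET or UNUSED. The Python sketch below expands such triples into a character map; `build_map` is a hypothetical helper, and it does not handle the third form of entry, where a minimum literal describes the characters instead of a base character number.

```python
from typing import Dict, List, Optional, Tuple, Union

# (document start, count, base start or "UNUSED")
Entry = Tuple[int, int, Union[int, str]]


def build_map(descset: List[Entry]) -> Dict[int, Optional[int]]:
    """Map each described document character number to its base-set number.

    None marks characters described as UNUSED.
    """
    mapping: Dict[int, Optional[int]] = {}
    for doc_start, count, base in descset:
        for offset in range(count):
            mapping[doc_start + offset] = base + offset if isinstance(base, int) else None
    return mapping


latin1_upper = build_map([(128, 32, "UNUSED"), (160, 95, 32), (255, 1, "UNUSED")])
print(latin1_upper[160], latin1_upper[254], latin1_upper[130])  # 32 126 None
print(len(latin1_upper))                                         # 128
```

So in the Latin-1 description, document character 160 corresponds to character 32 of the ECMA-94 right-part base set, and the three entries together cover exactly characters 128 through 255.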
Output format
The output is a series of lines. Lines can be arbitrarily long.
Each line consists of an initial command character and one or
more arguments. Arguments are separated by a single space, but
when a command takes a fixed number of arguments the last
argument can contain spaces. There is no space between the
command character and the first argument. Arguments can contain
the following escape sequences:
\\ A \.
\n A record end character.
\| Internal SDATA entities are bracketed by these.
\nnn The character whose code is nnn octal.
A record start character will be represented by \012.
Most applications will need to ignore \012 and translate
\n into newline.
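A consumer of this output has to undo the escapes. The Python sketch below does so, following the advice above to drop \012 and turn \n into a newline, and splitting the argument into plain data and internal SDATA segments at each \| marker. The name `decode` and the segment representation are hypothetical, and the sketch assumes that \nnn escapes always use exactly three octal digits.

```python
from typing import List, Tuple


def decode(arg: str) -> List[Tuple[str, str]]:
    """Split an sgmls output argument into ("data" | "sdata", text) segments."""
    segments: List[Tuple[str, str]] = []
    buf: List[str] = []
    in_sdata = False
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch != "\\":
            buf.append(ch)
            i += 1
            continue
        nxt = arg[i + 1]
        if nxt == "\\":
            buf.append("\\")
            i += 2
        elif nxt == "n":
            buf.append("\n")                 # record end -> newline
            i += 2
        elif nxt == "|":
            # \| opens or closes an internal SDATA entity: flush the current segment
            segments.append(("sdata" if in_sdata else "data", "".join(buf)))
            buf = []
            in_sdata = not in_sdata
            i += 2
        else:
            code = int(arg[i + 1:i + 4], 8)  # \nnn: character code in octal
            if code != 0o12:                 # ignore record start characters
                buf.append(chr(code))
            i += 4
    segments.append(("sdata" if in_sdata else "data", "".join(buf)))
    # Drop empty data segments but keep SDATA entities even if their text is empty
    return [s for s in segments if s[0] == "sdata" or s[1]]


print(decode(r"caf\|[eacute]\| au lait\n"))
# [('data', 'caf'), ('sdata', '[eacute]'), ('data', ' au lait\n')]
```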
The possible command characters and arguments are as follows:
(gi     The start of an element whose generic identifier is gi.
        Any attributes for this element will have been specified
        with A commands.
)gi     The end of an element whose generic identifier is gi.
-data   Data.
&name   A reference to an external data entity name; name will
        have been defined using an E command.
?pi     A processing instruction with data pi.
Aname val
        The next element to start has an attribute name with
        value val, which takes one of the following forms:
        IMPLIED
             The value of the attribute is implied.
        CDATA data
             The attribute is character data. This is used for
             attributes whose declared value is CDATA.
        NOTATION nname
             The attribute is a notation name; nname will have
             been defined using an N command. This is used for
             attributes whose declared value is NOTATION.
        ENTITY name...
             The attribute is a list of general entity names.
             Each entity name will have been defined using an I,
             E or S command. This is used for attributes whose
             declared value is ENTITY or ENTITIES.
        TOKEN token...
             The attribute is a list of tokens. This is used for
             attributes whose declared value is anything else.
Dename name val
        This is the same as the A command, except that it
        specifies a data attribute for an external entity named
        ename. Any D commands will come after the E command that
        defines the entity to which they apply, but before any &
        or A commands that reference the entity.
Nnname  Define a notation nname. This command will be preceded by
        a p command if the notation was declared with a public
        identifier, and by an s command if the notation was
        declared with a system identifier. A notation will only
        be defined if it is to be referenced in an E command or
        in an A command for an attribute with a declared value of
        NOTATION.
Eename typ nname
        Define an external data entity named ename with type typ
        (CDATA, NDATA or SDATA) and notation nname. This command
        will be preceded by one or more f commands giving the
        filenames generated by the entity manager from the system
        and public identifiers, by a p command if a public
        identifier was declared for the entity, and by an s
        command if a system identifier was declared for the
        entity. nname will have been defined using an N command.
        Data attributes may be specified for the entity using D
        commands. An external data entity will only be defined if
        it is to be referenced in a & command or in an A command
        for an attribute whose declared value is ENTITY or
        ENTITIES.
Iename typ text
        Define an internal data entity named ename with type typ
        (CDATA or SDATA) and entity text text. An internal data
        entity will only be defined if it is referenced in an A
        command for an attribute whose declared value is ENTITY
        or ENTITIES.
Sename  Define a subdocument entity named ename. This command will
        be preceded by one or more f commands giving the filenames
        generated by the entity manager from the system and public
        identifiers, by a p command if a public identifier was
        declared for the entity, and by an s command if a system
        identifier was declared for the entity. A subdocument
        entity will only be defined if it is referenced in a {
        command or in an A command for an attribute whose
        declared value is ENTITY or ENTITIES.
ssysid  This command applies to the next E, S or N command and
        specifies the associated system identifier.
ppubid  This command applies to the next E, S or N command and
        specifies the associated public identifier.
ffilename
        This command applies to the next E or S command and
        specifies an associated filename. There will be more than
        one f command for a single E or S command if the system
        identifier contained a colon.
{ename  The start of the SGML subdocument entity ename; ename will
        have been defined using an S command.
}ename  The end of the SGML subdocument entity ename.
Llineno file
Llineno
        Set the current line number and filename. The filename
        argument will be omitted if only the line number has
        changed. This will be output only if the -l option has
        been given.
#text   An APPINFO parameter of text was specified in the SGML
        declaration. This is not strictly part of the ESIS, but a
        structure-controlled application is permitted to act on
        it. No # command will be output if APPINFO NONE was
        specified. A # command will occur at most once, and may be
        preceded only by a single L command.
C       This command indicates that the document was a conforming
        SGML document. If this command is output, it will be the
        last command. An SGML document is not conforming if it
        references a subdocument entity that is not conforming.
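A structure-controlled application typically reads this output line by line and dispatches on the command character. The following Python sketch is a minimal, hypothetical reader (`parse_attr` and `esis_events` are invented names) that handles only the element, data, processing instruction, entity reference, attribute and C commands; it attaches pending A attributes to the next element start, leaves escape sequences undecoded, and passes every other command through untouched.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_attr(rest: str) -> Tuple[str, str, str]:
    """Split the argument of an A command into (name, form, value)."""
    name, _, tail = rest.partition(" ")
    form, _, value = tail.partition(" ")
    return name, form, value


def esis_events(lines: Iterable[str]) -> Iterator[tuple]:
    """Turn sgmls output lines into simple event tuples (escapes left undecoded)."""
    pending: Dict[str, Tuple[str, str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            name, form, value = parse_attr(rest)
            pending[name] = (form, value)   # belongs to the next element start
        elif cmd == "(":
            yield ("start", rest, pending)
            pending = {}
        elif cmd == ")":
            yield ("end", rest)
        elif cmd == "-":
            yield ("data", rest)
        elif cmd == "?":
            yield ("pi", rest)
        elif cmd == "&":
            yield ("entity_ref", rest)
        elif cmd == "C":
            yield ("conforming",)
        else:
            # E, I, S, N, D, s, p, f, {, }, L and # are left to a fuller reader
            yield ("other", cmd, rest)


sample = ["Astatus IMPLIED", "Atitle CDATA Hello world", "(DOC", "-Some text", ")DOC", "C"]
for event in esis_events(sample):
    print(event)
```

With sgmls installed, the input lines could come from `subprocess.run(["sgmls", "doc.sgm"], capture_output=True, text=True).stdout.splitlines()`, where doc.sgm is a placeholder filename.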
BUGS
Some non-SGML characters in literals are counted as two
characters for the purposes of quantity and capacity
calculations.
SEE ALSO
The SGML Handbook, Charles F. Goldfarb
ISO 8879 (Standard Generalized Markup Language), International
     Organization for Standardization
ORIGIN
ARCSGML was written by Charles F. Goldfarb.
Sgmls was derived from ARCSGML by James Clark
(jjc@jclark.com), to whom bugs should be reported.